Distribution free bounds for relational classification
Abstract
Statistical Relational Learning (SRL) is a sub-area of Machine Learning that addresses the problem of performing statistical inference on data that is correlated and not independently and identically distributed (i.i.d.), as is generally assumed. For the traditional i.i.d. setting, distribution free bounds exist, such as the Hoeffding bound, which are used to provide confidence bounds on the generalization error of a classification algorithm given its hold-out error on a sample of size N. Bounds of this form are currently not available for the type of interactions in the data that relational classification algorithms consider. In this paper we extend the Hoeffding bounds to the relational setting. In particular, we derive distribution free bounds for certain classes of data generation models that do not produce i.i.d. data and are based on the type of interactions considered by relational classification algorithms developed in SRL. We conduct empirical studies on synthetic and real data which show that these data generation models are indeed realistic and that the derived bounds are tight enough for practical use.
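As an illustration of the i.i.d. baseline that the abstract refers to, the following minimal sketch computes a two-sided Hoeffding confidence interval for the generalization error from a hold-out error measured on N samples. It uses the standard inequality P(|hold-out error - true error| >= eps) <= 2 exp(-2 N eps^2), which assumes i.i.d. samples and a loss bounded in [0, 1]. It does not implement the relational bounds derived in the paper. The function names `hoeffding_epsilon` and `hoeffding_interval` are hypothetical and chosen only for this example.

```python
import math


def hoeffding_epsilon(n, delta):
    """Half-width eps such that |holdout_error - true_error| < eps
    holds with probability at least 1 - delta (i.i.d. assumption).

    Obtained by solving 2 * exp(-2 * n * eps**2) = delta for eps.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_interval(holdout_error, n, delta=0.05):
    """Confidence interval for the generalization error, clipped to [0, 1]."""
    eps = hoeffding_epsilon(n, delta)
    return max(0.0, holdout_error - eps), min(1.0, holdout_error + eps)


if __name__ == "__main__":
    # Hypothetical numbers: 10% hold-out error on 1000 i.i.d. test points.
    low, high = hoeffding_interval(0.10, 1000, delta=0.05)
    print(f"95% interval for the true error: [{low:.3f}, {high:.3f}]")
```

The half-width shrinks as 1/sqrt(N) and does not depend on the data distribution, which is what makes the bound distribution free. When the samples are correlated, as in relational data, this i.i.d. guarantee no longer applies directly, which is the gap the paper sets out to address.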
Similar resources
Generalization bounds for incremental search classification algorithms
This paper presents generalization bounds for a certain class of classification algorithms. The bounds presented take advantage of the local nature of the search that these algorithms use in order to obtain bounds that are better than those that can be obtained using VC type bounds. The results are applied to well-known classification algorithms such as classification trees and the perceptron.
Moderated Class membership Interchange in Iterative Multi relational Graph Classifier
Organizing information resources into classes helps significantly in searching the massive volumes of on-line documents available through the Web or other information sources such as electronic mail, digital libraries, and corporate databases. Existing classification methods are often based only on the content of the document itself, i.e. its attributes. Considering relations in the web document space brings bet...
Algorithms and Applications for Universal Quantification in Relational
Queries containing universal quantification are used in many applications, including business intelligence applications and in particular data mining. We present a comprehensive survey of the structure and performance of algorithms for universal quantification. We introduce a framework that results in a complete classification of input data for universal quantification. Then we go on to identify th...
Risk bounds for Statistical Learning
We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov's analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with other ways of measuring the size of a clas...
Data Mining using Nonmonotonic Connectionist Expert Systems
An application of Nonmonotonic Connectionist Expert Systems (NCESs) in mining classification rules from large relational databases is presented. NCESs are hybrid learning systems that can acquire symbolic knowledge of a nonmonotonic domain, represented using nonmonotonic inheritance networks. This initial knowledge can be refined using connectionist learning techniques and a set of classified exam...
Publication date: 2008